@enyst
Collaborator

@enyst enyst commented Oct 23, 2025

@xingyaoww I'll fix the conflicts etc., but I'd love your attention for a bit here: this PR simplifies things in agent-sdk, because we lost recognition of "claude-x" in model names whenever the provider had a more complex name than just "anthropic/". In other words, we lost the Bedrock match. This fixes it.

The cause is that at some point recently we changed from a simple match of the core name ("claude-3-5") anywhere in the full provider/model string (which is accurate! No Llama will call itself "claude-3-5" 😅) to code using globbing and normalization, which forced us to add half a million patterns to account for the variety of forms in which "claude-3-5" shows up out there. And it keeps missing some, of course.

I looked into it and I believe that was a mistake I made. This is a revert.

This fixes some Bedrock issues reported on Slack. I'd love to merge this to stop the whack-a-mole, rather than hardcoding more stuff into these patterns. I think maybe we can also take the patterns out, but I would still love this revert first.


Summary

  • Return to simple core family substring matching across the full raw model string
  • Remove fnmatch/globbing and stop using normalization for feature detection
  • Update pattern tables to pure substrings (no wildcards)
  • Add tests to validate e.g. Bedrock-style names (the messiest)

What & Why

This PR restores the durable invariant:

  • if a meaningful family token (e.g., 'claude-3-5-sonnet', 'gpt-4o', 'o3', 'gemini-2.5-pro') appears anywhere in the model string
    => the feature applies.

"bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0" => "claude-3-5-sonnet" applies.

This eliminates the pattern maintenance whack-a-mole caused by dotted prefixes and provider-specific suffixes and aligns again with proven behavior in the wild.

A recent refactor introduced fnmatch-based globbing over a normalized basename. This unintentionally diverged from the prior V0 behavior, where we effectively matched by substring on the full provider/model name. That change broke real-world cases, notably with AWS Bedrock, where names embed dotted vendor prefixes and version suffixes inside the basename (e.g., bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0). Patterns like 'claude-3-5-sonnet*' stopped matching such names, because after normalization the glob is anchored at the start of the basename and the family token sits behind the vendor prefix.
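To make the failure mode concrete, here is a minimal sketch comparing the two approaches on the Bedrock name above. The `basename` helper is a hypothetical stand-in for the normalization step (assumed here to lowercase and keep the part after the last '/'); it is not the SDK's actual `normalize_model_name()`.

```python
from fnmatch import fnmatchcase

model = "bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0"
pattern = "claude-3-5-sonnet*"


def basename(name: str) -> str:
    """Hypothetical normalization: lowercase, keep what follows the last '/'."""
    return name.strip().lower().rsplit("/", 1)[-1]


# Glob over the normalized basename. The glob must match from the first
# character, but the basename starts with the vendor prefix "anthropic.".
glob_hit = fnmatchcase(basename(model), pattern)

# Substring check on the full raw string. The family token is found no
# matter what precedes or follows it.
substring_hit = pattern.rstrip("*") in model.lower()

print(glob_hit, substring_hit)  # False True
```

The glob only succeeds when the basename begins with the family token, which is why each new vendor prefix needed another pattern.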

Implementation Details

  1. model_matches()

    • Lowercase + strip the incoming model string and perform case-insensitive substring checks on the full raw string
    • For each pattern, lowercase/strip and drop any trailing '*' (migration aid); treat the remaining token as a plain substring
    • Return True on first match; False otherwise
    • No use of normalize_model_name() here
  2. Pattern tables: remove '*'

    • FUNCTION_CALLING_PATTERNS, REASONING_EFFORT_PATTERNS, PROMPT_CACHE_PATTERNS, SUPPORTS_STOP_WORDS_FALSE_PATTERNS, RESPONSES_API_PATTERNS now contain pure substrings
    • Provider-qualified entries remain supported by virtue of substring matching against the raw string
  3. normalize_model_name()

    • Not used by matching. Tests exercising normalization for matching were removed to avoid confusion
  4. Tests

    • Remove wildcard expectations; adapt to pure substring semantics
    • Ensure Bedrock coverage: e.g., 'bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0' enables function calling and prompt cache
    • Verify provider-qualified substrings gate as expected (e.g., the pattern 'openai/gpt-4o' matches the model 'openai/gpt-4o' but not a model under 'anthropic/')
    • Keep conservative defaults for unknown models
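The matching steps listed under `model_matches()` above could be sketched roughly as follows. This is an illustration under assumptions, not the merged code: the two tables hold only a few sample entries, and the empty-token guard is an addition for safety (an empty string is a substring of everything). A later commit in this PR reports that patterns are used exactly as provided, without the trailing '*' handling shown here.

```python
from typing import Iterable

# Sample entries only (assumption); the real tables in the SDK are longer.
PROMPT_CACHE_PATTERNS = ["claude-3-5-sonnet", "claude-haiku-4-5", "claude-haiku-4.5"]
REASONING_EFFORT_PATTERNS = ["o3", "gpt-5"]


def model_matches(model: str, patterns: Iterable[str]) -> bool:
    """Return True if any pattern occurs as a substring of the raw model string."""
    raw = (model or "").strip().lower()
    if not raw:
        return False
    for pattern in patterns:
        # Lowercase/strip the pattern and drop trailing '*' (migration aid).
        token = pattern.strip().lower().rstrip("*")
        # Skip empty tokens: "" is a substring of every string.
        if token and token in raw:
            return True
    return False


# Bedrock-style name: the family token is embedded mid-string.
print(model_matches("bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0", PROMPT_CACHE_PATTERNS))  # True
# Provider-qualified pattern gates on the provider prefix too.
print(model_matches("anthropic/claude-3-5-sonnet", ["openai/gpt-4o"]))  # False
# Unknown models fall through to the conservative default.
print(model_matches("some-unknown-model", PROMPT_CACHE_PATTERNS))  # False
```

Because nothing is normalized, a provider-qualified pattern such as 'openai/gpt-4o' only fires when the provider prefix is present in the raw string, while a bare family token fires for any provider.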

Outcomes

  • Clear behavior: if the essential family token appears in the model string, the feature applies
  • Fewer special-case patterns and more durable matching across providers
  • Restores pre-refactor semantics that worked reliably in practice

Checklist

  • Code formatted and linted via pre-commit
  • Updated tests for sdk changes; all impacted sdk tests pass locally

Closes #844


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Base Image | Docs / Tags |
|---------|------------|-------------|
| golang | golang:1.21-bookworm | Link |
| java | eclipse-temurin:17-jdk | Link |
| python | nikolaik/python-nodejs:python3.12-nodejs22 | Link |

Pull (multi-arch manifest)

docker pull ghcr.io/openhands/agent-server:45b3ece-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-45b3ece-python \
  ghcr.io/openhands/agent-server:45b3ece-python

All tags pushed for this build

ghcr.io/openhands/agent-server:45b3ece-golang
ghcr.io/openhands/agent-server:v1.0.0a5_golang_tag_1.21-bookworm_binary
ghcr.io/openhands/agent-server:45b3ece-java
ghcr.io/openhands/agent-server:v1.0.0a5_eclipse-temurin_tag_17-jdk_binary
ghcr.io/openhands/agent-server:45b3ece-python
ghcr.io/openhands/agent-server:v1.0.0a5_nikolaik_s_python-nodejs_tag_python3.12-nodejs22_binary

The 45b3ece tag is a multi-arch manifest (amd64/arm64); your client pulls the right arch automatically.

Cross-repo impact: Fix: OpenHands/OpenHands#11248

…normalize usage

- model_matches now does case-insensitive substring on full raw model
- strip trailing '*' in patterns (migration aid)
- pattern tables converted to plain substrings (no '*')
- drop normalize_model_name and related tests
- update tests to reflect substring semantics and Bedrock coverage

Fixes #844

Co-authored-by: openhands <[email protected]>
@github-actions
Contributor

github-actions bot commented Oct 23, 2025

Coverage

Coverage Report •
| File | Stmts | Miss | Cover | Missing |
|------|-------|------|-------|---------|
| TOTAL | 11093 | 4988 | 55% | |
report-only-changed-files is enabled. No files were changed during this commit :)

@enyst enyst marked this pull request as draft October 23, 2025 18:51
enyst and others added 4 commits October 23, 2025 18:53
…handling and empty-token skipping

- Patterns are now used exactly as provided (lowercased/stripped)
- No special handling for '*' or empty tokens

Co-authored-by: openhands <[email protected]>
…eature detection

- Validate provider-prefixed Bedrock ids and plain vendor-prefixed names
- Ensure function-calling and prompt-cache features are enabled for Claude families

Co-authored-by: openhands <[email protected]>
…edrock dotted vendor prefixes

- Function-calling: adds claude-sonnet-4-5 and claude-sonnet-4.5, and us.anthropic.* examples
- Prompt cache: keep only supported families; drop unsupported haiku-4.5 dotted vendor case

Co-authored-by: openhands <[email protected]>
… extend tests with dotted vendor forms

- Add claude-haiku-4.5 and claude-haiku-4-5 to PROMPT_CACHE_PATTERNS
- Expand tests for us.anthropic.* and local names for Haiku 4.5

Co-authored-by: openhands <[email protected]>
@enyst enyst added the integration-test Runs the integration tests and comments the results label Oct 23, 2025
@github-actions
Contributor

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@blacksmith-sh
Contributor

blacksmith-sh bot commented Nov 1, 2025

[Automatic Post]: It has been a while since there was any activity on this PR. @enyst, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up.

@enyst
Collaborator Author

enyst commented Nov 1, 2025

@OpenHands Merge main into this PR and fix the conflicts.

@openhands-ai

openhands-ai bot commented Nov 1, 2025

Uh oh! There was an unexpected error starting the job :(

@enyst enyst added integration-test Runs the integration tests and comments the results and removed integration-test Runs the integration tests and comments the results labels Nov 1, 2025
@github-actions
Contributor

github-actions bot commented Nov 1, 2025

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@github-actions
Contributor

github-actions bot commented Nov 1, 2025

🧪 Integration Tests Results

Overall Success Rate: 100.0%
Total Cost: $0.84
Models Tested: 3
Timestamp: 2025-11-01 14:03:24 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

| Model | Success Rate | Tests Passed | Total Tests | Cost |
|-------|--------------|--------------|-------------|------|
| litellm_proxy_claude_sonnet_4_5_20250929 | 100.0% | 7/7 | 7 | $0.73 |
| litellm_proxy_gpt_5_mini_2025_08_07 | 100.0% | 7/7 | 7 | $0.04 |
| litellm_proxy_deepseek_deepseek_chat | 100.0% | 7/7 | 7 | $0.07 |

📋 Detailed Results

litellm_proxy_claude_sonnet_4_5_20250929

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.73
  • Run Suffix: litellm_proxy_claude_sonnet_4_5_20250929_602f6bd_sonnet_run_N7_20251101_135909

litellm_proxy_gpt_5_mini_2025_08_07

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.04
  • Run Suffix: litellm_proxy_gpt_5_mini_2025_08_07_602f6bd_gpt5_mini_run_N7_20251101_135914

litellm_proxy_deepseek_deepseek_chat

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.07
  • Run Suffix: litellm_proxy_deepseek_deepseek_chat_602f6bd_deepseek_run_N7_20251101_135910

@enyst enyst marked this pull request as ready for review November 1, 2025 16:42
@enyst enyst requested a review from xingyaoww November 1, 2025 16:42
Collaborator

@xingyaoww xingyaoww left a comment

lgtm! thank you

enyst and others added 5 commits November 1, 2025 22:25
This updates the direct completion access to use resp.message and extract TextContent, aligning with the current LLMResponse interface.

Co-authored-by: openhands <[email protected]>
…ring matching semantics and add function_calling support flag

- Keep substring-based model_matches per PR #879 direction
- Restore function calling patterns and features
- Align tests to substring semantics and function_calling expectations

Co-authored-by: openhands <[email protected]>
…ing semantics

- Remove supports_function_calling field and patterns
- Remove related tests
- Keep substring-based model_matches; keep other feature flags intact

Co-authored-by: openhands <[email protected]>
…mini-latest' and gpt-5 family

Co-authored-by: openhands <[email protected]>
@openhands-ai

openhands-ai bot commented Nov 1, 2025

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Agent Server

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #879 at branch `openhands/revert-to-substring-matching`

Feel free to include any additional details that might help me get this PR into a better state.


@enyst enyst enabled auto-merge (squash) November 2, 2025 10:34
@enyst enyst merged commit 21c4d27 into main Nov 2, 2025
13 checks passed
@enyst enyst deleted the openhands/revert-to-substring-matching branch November 2, 2025 10:34